My project.
This post assumes that the reader has read my last post.
This week I had less time to work on my project than in previous weeks, because it was more or less the last week of the semester for some of my courses. That was unfortunate in one way but good in another: I have now finished most of my courses and can focus more on the project. It is also why this weekly report is slightly late.
Most of my work this week went into continuing the investigation of the segfault caused by my implementation of allocation in lines. The most interesting discovery came after a suggestion from my mentor: to use +RTS -DS to turn on the sanity checker. I ran the code with my allocation implementation and the sanity checker, and got a segfault. I ran it again with only the code that frees memory in lines, and also got a segfault. Then I ran it without any of my patches, using plain sweep, and still got the segfault. So it seems that something is wrong with sweep itself, and I'm now investigating this new segfault.
I'm testing with the bernouilli program from nofib, running it with 148 as the parameter and, naturally, with +RTS -w -DS passed to the runtime system. The output of gdb:
Current directory is /home/marcot/trabalho/livre/ghc/nofib/imaginary/bernouilli/
GNU gdb (GDB) 7.1-debian
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /home/marcot/trabalho/livre/ghc/nofib/imaginary/bernouilli/Main...done.
(gdb) r 148 +RTS -w -DS
Starting program: /home/marcot/trabalho/livre/ghc/nofib/imaginary/bernouilli/Main 148 +RTS -w -DS
[Thread debugging using libthread_db enabled]
Program received signal SIGSEGV, Segmentation fault.
0x00000000006309cb in LOOKS_LIKE_INFO_PTR_NOT_NULL (p=12297829382473034410) at includes/rts/storage/ClosureMacros.h:225
(gdb) where
#0 0x00000000006309cb in LOOKS_LIKE_INFO_PTR_NOT_NULL (p=12297829382473034410) at includes/rts/storage/ClosureMacros.h:225
#1 0x0000000000630a16 in LOOKS_LIKE_INFO_PTR (p=12297829382473034410) at includes/rts/storage/ClosureMacros.h:230
#2 0x0000000000630a4b in LOOKS_LIKE_CLOSURE_PTR (p=0x7ffff6c84062) at includes/rts/storage/ClosureMacros.h:235
#3 0x0000000000631427 in checkClosure (p=0x7ffff6853a08) at rts/sm/Sanity.c:320
#4 0x00000000006319ef in checkHeap (bd=0x7ffff68014c0) at rts/sm/Sanity.c:479
#5 0x000000000063222f in checkSanity (check_heap=rtsTrue) at rts/sm/Sanity.c:686
#6 0x000000000062dfdb in GarbageCollect (force_major_gc=rtsFalse, gc_type=0, cap=0x8d2ec0) at rts/sm/GC.c:768
#7 0x0000000000620431 in scheduleDoGC (cap=0x8d2ec0, task=0x8f5080, force_major=rtsFalse) at rts/Schedule.c:1420
#8 0x000000000061fa2c in schedule (initialCapability=0x8d2ec0, task=0x8f5080) at rts/Schedule.c:539
#9 0x0000000000620c77 in scheduleWaitThread (tso=0x7ffff6c80000, ret=0x0, cap=0x8d2ec0) at rts/Schedule.c:1902
#10 0x000000000065762b in rts_evalLazyIO (cap=0x8d2ec0, p=0x89d8b0, ret=0x0) at rts/RtsAPI.c:495
#11 0x000000000061d3db in real_main () at rts/RtsMain.c:66
#12 0x000000000061d4ca in hs_main (argc=5, argv=0x7fffffffe6d8, main_init=0x406558 <__stginit_ZCMain>, main_closure=0x89d8b0) at rts/RtsMain.c:115
#13 0x00007ffff6fbcabd in __libc_start_main (main=<value optimized out>, argc=<value optimized out>, ubp_av=<value optimized out>, init=<value optimized out>, fini=<value optimized out>, rtld_fini=<value optimized out>, stack_end=0x7fffffffe6c8) at libc-start.c:222
#14 0x0000000000403a69 in _start ()
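In frames #0 and #1 the value of p (the word the sanity checker is about to treat as an info pointer) looks arbitrary in decimal, but in hexadecimal it is 0xaaaaaaaaaaaaaaaa: the byte 0xaa repeated eight times. The small C check below confirms that conversion at compile time; GDB_P and is_repeated_byte are names made up for this illustration. A repeated byte like this suggests the closure header was read from filler memory rather than from a real object. One plausible reading, stated as an assumption and not verified against the RTS source here, is that the debug RTS fills freed blocks with 0xaa when sanity checking is on, which would mean some closure still points into memory that has already been freed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The value gdb printed for p in frames #0 and #1. */
#define GDB_P UINT64_C(12297829382473034410)

/* Compile-time check of the decimal-to-hex conversion (C11 static_assert). */
static_assert(GDB_P == UINT64_C(0xaaaaaaaaaaaaaaaa),
              "p is the byte 0xaa repeated eight times");

/* Hypothetical helper: true if all eight bytes of w are equal.
 * Multiplying the low byte by 0x0101010101010101 copies it into every
 * byte position; no carries occur because the byte is below 0x100. */
static bool is_repeated_byte(uint64_t w)
{
    uint64_t low = w & 0xff;
    return w == low * UINT64_C(0x0101010101010101);
}
```

Applied to the other values in the backtrace, such as the closure pointer 0x7ffff6c84062, the helper returns false; only the suspected info pointer has this shape.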
I started the investigation with the most obvious test: removing the call to sweep() in rts/sm/GC.c.
if (major_gc && oldest_gen->mark) {
    if (oldest_gen->compact)
        compact(gct->scavenged_static_objects);
    // else
    //     sweep(oldest_gen);
}
The result was the same: the same segfault in the same place. So I tried something stronger and prevented the blocks from getting the BF_MARKED flag, so that they are never marked at all.
if (!(bd->flags & BF_FRAGMENTED)) {
    // bd->flags |= BF_MARKED;
}
This one worked. The call to sweep() is not commented out this time, but that doesn't matter, since sweep() ignores blocks that don't have this flag. My third attempt was to comment out, one at a time, the places where the BF_MARKED flag is read, to see which one was causing the segfault. I got a list of places to check with grep, and there weren't many of them.
$ grep BF_MARKED rts/sm/*.c
rts/sm/Compact.c: if (bd->flags & BF_MARKED)
rts/sm/Evac.c: if ((bd->flags & (BF_LARGE | BF_MARKED | BF_EVACUATED)) != 0)
rts/sm/Evac.c: if (bd->flags & BF_MARKED)
rts/sm/GCAux.c: if ((bd->flags & BF_MARKED) && is_marked((P_)q,bd))
rts/sm/GC.c: if (!(bd->flags & BF_MARKED))
rts/sm/GC.c: // time, so reset the BF_MARKED flags.
rts/sm/GC.c: // compact. (search for BF_MARKED above).
rts/sm/GC.c: bd->flags &= ~BF_MARKED;
rts/sm/GC.c: // Also at this point we set the BF_MARKED flag
rts/sm/GC.c: // BF_MARKED is always unset, except during GC
rts/sm/GC.c: bd->flags |= BF_MARKED;
rts/sm/Sweep.c: if (!(bd->flags & BF_MARKED))
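The GC.c comments in this list describe the life cycle of the flag: it is set on the old generation's blocks (unless they are BF_FRAGMENTED) before collection, sweep() only looks at blocks that carry it, and it is reset afterwards, since BF_MARKED is always unset except during GC. The toy model below sketches that cycle to show why commenting out the assignment was enough to keep sweep() from doing anything. The flag values, the Block struct and the toy_* functions are hypothetical stand-ins, not the RTS's bdescr or its real functions.

```c
#include <assert.h>
#include <stddef.h>

/* Made-up flag bits for this toy; the real BF_* values are defined in the RTS. */
enum {
    F_MARKED     = 1u << 0,
    F_FRAGMENTED = 1u << 1,
};

/* Hypothetical stand-in for a block descriptor: just flags and a link. */
typedef struct Block {
    unsigned flags;
    struct Block *link;
} Block;

/* Start of a major GC: flag every non-fragmented block of the old
 * generation, like the "bd->flags |= BF_MARKED" line in GC.c. */
static void toy_set_marks(Block *blocks)
{
    for (Block *bd = blocks; bd != NULL; bd = bd->link)
        if (!(bd->flags & F_FRAGMENTED))
            bd->flags |= F_MARKED;
}

/* Toy sweep: blocks without the flag are skipped, as in Sweep.c.
 * Returns how many blocks it would actually process. */
static size_t toy_sweep(const Block *blocks)
{
    size_t processed = 0;
    for (const Block *bd = blocks; bd != NULL; bd = bd->link) {
        if (!(bd->flags & F_MARKED))
            continue;
        /* toy_set_marks never flags fragmented blocks */
        assert(!(bd->flags & F_FRAGMENTED));
        processed++;
    }
    return processed;
}

/* End of GC: restore the invariant that the flag is unset outside GC. */
static void toy_clear_marks(Block *blocks)
{
    for (Block *bd = blocks; bd != NULL; bd = bd->link)
        bd->flags &= ~(unsigned)F_MARKED;
}
```

Skipping toy_set_marks corresponds to the experiment above: toy_sweep is still called but processes zero blocks.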
The first match in the grep output is in rts/sm/Compact.c, so it is not relevant when running with -w, since no compaction happens in that mode. The second one, in rts/sm/Evac.c, is a bit indirect.
if ((bd->flags & (BF_LARGE | BF_MARKED | BF_EVACUATED)) != 0) {
    // pointer into to-space: just return it. It might be a pointer
    // into a generation that we aren't collecting (> N), or it
    // might just be a pointer into to-space. The latter doesn't
    // happen often, but allowing it makes certain things a bit
    // easier; e.g. scavenging an object is idempotent, so it's OK to
    // have an object on the mutable list multiple times.
    if (bd->flags & BF_EVACUATED) {
        // We aren't copying this object, so we have to check
        // whether it is already in the target generation. (this is
        // the write barrier).
        if (bd->gen < gct->evac_gen) {
            gct->failed_to_evac = rtsTrue;
            TICK_GC_FAILED_PROMOTION();
        }
        return;
    }
    /* evacuate large objects by re-linking them onto a different list.
     */
    if (bd->flags & BF_LARGE) {
        info = get_itbl(q);
        if (info->type == TSO &&
            ((StgTSO *)q)->what_next == ThreadRelocated) {
            q = (StgClosure *)((StgTSO *)q)->_link;
            *p = q;
            goto loop;
        }
        evacuate_large((P_)q);
        return;
    }
    /* If the object is in a gen that we're compacting, then we
     * need to use an alternative evacuate procedure.
     */
    if (!is_marked((P_)q,bd)) {
        mark((P_)q,bd);
        push_mark_stack((P_)q);
    }
    return;
}
The first if, at line 466 of rts/sm/Evac.c, is taken if any of the three flags is present: BF_LARGE, BF_MARKED or BF_EVACUATED. The second if, at line 474, checks for BF_EVACUATED and returns. The third if, at line 487, checks for BF_LARGE and returns. So the code at lines 502-505 runs only when BF_MARKED is set and neither of the other two flags is. I tried commenting out this code and got an assertion failure in the user code, so I don't think this is a promising path to follow.
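The order of those tests can be captured in a small toy function, which makes it easy to check that the marking branch is reached only when BF_MARKED is the sole flag of the three. This is not the RTS code: the flag values and the names evac_path and EvacPath are invented for illustration, and each return value only labels which branch of evacuate() would run.

```c
#include <assert.h>

/* Invented bit values; the real BF_* constants are defined in the RTS. */
enum {
    F_LARGE     = 1u << 0,
    F_MARKED    = 1u << 1,
    F_EVACUATED = 1u << 2,
};

typedef enum {
    PATH_COPY,       /* none of the three flags: falls through to copying */
    PATH_EVACUATED,  /* to-space or uncollected generation: return early */
    PATH_LARGE,      /* large object: relinked onto another list */
    PATH_MARK,       /* mark bit set and object pushed on the mark stack */
} EvacPath;

/* Mirrors the order of the tests in evacuate(): the outer "any of the
 * three" check, then BF_EVACUATED, then BF_LARGE, and only then marking. */
static EvacPath evac_path(unsigned flags)
{
    if ((flags & (F_LARGE | F_MARKED | F_EVACUATED)) != 0) {
        if (flags & F_EVACUATED)
            return PATH_EVACUATED;
        if (flags & F_LARGE)
            return PATH_LARGE;
        assert(flags & F_MARKED); /* the only one of the three left */
        return PATH_MARK;
    }
    return PATH_COPY;
}
```

Looping over all eight combinations of the three bits confirms that PATH_MARK comes back for exactly one of them, F_MARKED alone.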
The next occurrence of BF_MARKED is further down in the same file.
if (bd->flags & BF_MARKED) {
    // must call evacuate() to mark this closure if evac==rtsTrue
    *q = (StgClosure *)p;
    if (evac) evacuate(q);
    unchain_thunk_selectors(prev_thunk_selector, (StgClosure *)p);
    return;
}
Commenting it out, whether or not the call to sweep() is also commented out, produces the same segfault, so I'm inclined to think this part of the code is unrelated to the issue.
The next one is in rts/sm/GCAux.c. I tried commenting it out in all four combinations of the two other changes (each commented out or not), and every one of them produced the segfault in the same place.
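The GCAux.c condition in the grep output, (bd->flags & BF_MARKED) && is_marked((P_)q,bd), is a two-level test: the block has to carry the flag, and the object's own bit in the block's mark bitmap has to be set. The toy below sketches that with a hypothetical ToyBlock holding one mark bit per payload word; the sizes, names and offset arithmetic are invented, and the real is_marked() may compute things differently. In the RTS an object that fails this test is not necessarily dead, because the code goes on to other checks, so the toy only answers "alive by mark bit".

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Invented sizes and flag value, for illustration only. */
#define WORDS_PER_BLOCK 512
#define BITS_PER_WORD   (sizeof(uintptr_t) * CHAR_BIT)

enum { F_MARKED = 1u << 0 };

/* Hypothetical block: payload words plus one mark bit per word. */
typedef struct {
    unsigned  flags;
    uintptr_t payload[WORDS_PER_BLOCK];
    uintptr_t bitmap[WORDS_PER_BLOCK / BITS_PER_WORD];
} ToyBlock;

/* Offset of p, in words, from the start of the block's payload. */
static size_t word_offset(const ToyBlock *bd, const uintptr_t *p)
{
    assert(p >= bd->payload && p < bd->payload + WORDS_PER_BLOCK);
    return (size_t)(p - bd->payload);
}

static void toy_mark(ToyBlock *bd, const uintptr_t *p)
{
    size_t off = word_offset(bd, p);
    bd->bitmap[off / BITS_PER_WORD] |= (uintptr_t)1 << (off % BITS_PER_WORD);
}

static bool toy_is_marked(const ToyBlock *bd, const uintptr_t *p)
{
    size_t off = word_offset(bd, p);
    return (bd->bitmap[off / BITS_PER_WORD] >> (off % BITS_PER_WORD)) & 1;
}

/* Same shape as the GCAux.c test: block flag AND object mark bit. */
static bool alive_by_mark_bit(const ToyBlock *bd, const uintptr_t *p)
{
    return (bd->flags & F_MARKED) && toy_is_marked(bd, p);
}
```

A marked object inside a block without the flag does not pass this test, so the block flag and the bitmap only give an answer together.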
There's one more place to check, in rts/sm/GC.c. Again, commenting it out made no difference. The last match is part of sweep(), so it is skipped anyway when the call to that function is commented out.